feat: enhanced scientific RAG pipeline for research workflows (ISAAC-497) #23

Open
watcharaponthod-code wants to merge 1 commit into aietal:master from watcharaponthod-code:feat/enhanced-scientific-rag-isaac497

@watcharaponthod-code

Bounty: ISAAC-497
Algora bounty: https://algora.io/isaac/bounties/clq18zr98000ejs0gt0nv7gwu

Summary

This PR implements an enhanced RAG pipeline for scientific and research document workflows. It changes five files and adds a dedicated utility module, ui/utils/server/scientific-rag.ts, whose public helpers are covered by 30 tests.

Key improvements

Section-aware chunking: SCIENTIFIC_SEPARATORS splits documents at Abstract/Methods/Results boundaries. Every chunk stores citationKey, section, and sectionWeight in ChromaDB.
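
As a rough illustration of how section detection and weighting could fit together, here is a minimal sketch. The heading regexes, the `Section` type, and the helper names `detectSection` and `weightFor` are assumptions made for this example and are not taken from the PR's code; only the weight values come from the description.

```typescript
// Hypothetical sketch: names and heuristics are illustrative assumptions.
type Section = "abstract" | "methods" | "results" | "body";

// Weights as quoted in this PR description.
const SECTION_WEIGHTS: Record<Section, number> = {
  abstract: 1.4,
  results: 1.3,
  methods: 1.2,
  body: 0.8,
};

// Assumed heuristic: a chunk belongs to a section when it starts with that heading.
const SECTION_PATTERNS: Array<[Section, RegExp]> = [
  ["abstract", /^\s*abstract\b/i],
  ["methods", /^\s*(materials and methods|methods|methodology)\b/i],
  ["results", /^\s*results\b/i],
];

function detectSection(chunkText: string): Section {
  for (const [section, pattern] of SECTION_PATTERNS) {
    if (pattern.test(chunkText)) return section;
  }
  // Anything without a recognised heading is treated as generic body text.
  return "body";
}

function weightFor(section: Section): number {
  return SECTION_WEIGHTS[section];
}
```

Because the splitter breaks at section headings first, a chunk cut this way tends to begin with its heading, which is what makes a prefix check like the one above plausible.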

Multi-query retrieval with RRF + section weighting: buildResearchQueries expands the user query into 4 deterministic variants. fuseQueryResults applies Reciprocal Rank Fusion with section importance weights (abstract 1.4x, results 1.3x, methods 1.2x, body 0.8x). Duplicate chunks accumulate scores across query variants.
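
The fusion step can be sketched as follows. This is a minimal version of Reciprocal Rank Fusion that assumes the section weight simply multiplies each rank contribution and uses the conventional constant k = 60; the actual fuseQueryResults may combine them differently. The `RankedChunk` shape and the function name `fuseRankedLists` are hypothetical.

```typescript
// Hypothetical sketch of section-weighted Reciprocal Rank Fusion.
interface RankedChunk {
  id: string;            // stable chunk identifier, e.g. a citation key
  sectionWeight: number; // e.g. 1.4 for abstract, 0.8 for body
}

interface FusedChunk extends RankedChunk {
  score: number;
}

// Each inner array is one query variant's results, best match first.
function fuseRankedLists(resultSets: RankedChunk[][], k = 60): FusedChunk[] {
  const fused = new Map<string, FusedChunk>();
  for (const results of resultSets) {
    results.forEach((chunk, index) => {
      const rank = index + 1; // RRF ranks are 1-based
      // Assumption: the weight scales the standard 1 / (k + rank) term.
      const contribution = chunk.sectionWeight / (k + rank);
      const existing = fused.get(chunk.id);
      if (existing) {
        // Duplicates across query variants accumulate score.
        existing.score += contribution;
      } else {
        fused.set(chunk.id, { ...chunk, score: contribution });
      }
    });
  }
  return Array.from(fused.values()).sort((a, b) => b.score - a.score);
}
```

A chunk returned by several query variants ends up with a higher fused score than one returned by a single variant at the same rank, which is the deduplication behaviour described above.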

Stable citation keys: buildCitationKey produces deterministic keys of the form title-slug:pPage:cChunk+1, that is, the slugified title, the page number, and the chunk index plus one. buildChunkMetadata strips server-side temp upload paths from the public source field.
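
For illustration, a key in this format could be built along these lines. The slug rules, the "untitled" fallback, and the helper names `slugify`, `citationKey`, and `publicSourceName` are assumptions for this sketch, not the PR's implementation; in particular, reducing the source to its file name is only one possible way to hide a temp upload path.

```typescript
// Hypothetical sketch: slug rules and helper names are illustrative assumptions.
function slugify(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one dash
    .replace(/^-+|-+$/g, "");    // trim leading and trailing dashes
}

// chunkIndex is assumed to be zero-based, so the key shows chunkIndex + 1.
function citationKey(title: string, page: number, chunkIndex: number): string {
  const slug = slugify(title) || "untitled";
  return `${slug}:p${page}:c${chunkIndex + 1}`;
}

// Assumed approach to hiding server paths: keep only the final path segment.
function publicSourceName(sourcePath: string): string {
  const parts = sourcePath.split(/[\\\/]/);
  return parts[parts.length - 1] || sourcePath;
}
```

The key depends only on the title, page, and chunk position, so re-ingesting the same document with the same chunking settings yields the same keys.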

Scientific chat prompt: the system prompt now adopts a strict research-assistant persona that cites every claim by key and prefers Results/Methods evidence over Introduction/Discussion. fetchResearchEvidence builds a same-origin URL instead of using a hard-coded localhost:3000. Temperature is taken from the request instead of being hard-coded to 0.
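
One way to derive a same-origin URL inside an API route is to read the host and protocol from the incoming request headers. The sketch below assumes a plain header map and a hypothetical helper `sameOriginUrl`; it is not the code in this PR, and whether forwarded headers can be trusted depends on the deployment's proxy setup.

```typescript
// Hypothetical sketch: derive the current origin from request headers.
type HeaderValue = string | string[] | undefined;

function first(value: HeaderValue): string | undefined {
  return Array.isArray(value) ? value[0] : value;
}

function sameOriginUrl(headers: Record<string, HeaderValue>, path: string): string {
  // Prefer proxy-provided values when present, else fall back to Host.
  const host = first(headers["x-forwarded-host"]) ?? first(headers["host"]);
  if (!host) throw new Error("request has no Host header");
  // x-forwarded-proto may be a comma-separated list; take the first entry.
  const proto = (first(headers["x-forwarded-proto"]) ?? "http").split(",")[0].trim();
  return new URL(path, `${proto}://${host}`).toString();
}
```

With headers `{ host: "localhost:3000" }` this resolves `/api/fetch-documents` against the local dev server, and behind an HTTPS proxy it resolves against the public host instead.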

Validation

npx vitest run - 30/30 tests passed
npx tsc --noEmit - 0 type errors

Payout: Algora bounty-platform payout to GitHub user @watcharaponthod-code.

…497)

Implement section-aware document ingestion, multi-query retrieval with
reciprocal rank fusion, stable citation keys, and budget-capped evidence
context for the scientific/research RAG workflow.

Changes:
- Add ui/utils/server/scientific-rag.ts: core RAG utilities
  * detectScientificSection: identifies abstract/methods/results/etc from chunk text
  * sectionWeight: importance weights (abstract 1.4x, results 1.3x, methods 1.2x...)
  * buildChunkMetadata: typed metadata with stable citationKey, strips temp paths
  * buildResearchQueries: expands query into 4 deterministic variants for recall
  * fuseQueryResults: RRF + section-weighted deduplication across query result sets
  * buildEvidencePayload: budget-capped evidence context + source manifest
  * parseBoundedInteger: safe integer parsing for API params
  * SCIENTIFIC_SEPARATORS: section-heading-first text splitter separators
- Update ui/pages/api/inject-documents.ts:
  * Use SCIENTIFIC_SEPARATORS for section-aligned chunking (900 char chunks)
  * Replace processDocuments with buildChunkMetadata for typed, safe metadata
  * Store citationKey, section, sectionWeight in ChromaDB for downstream ranking
- Update ui/pages/api/fetch-documents.ts:
  * Expand query with buildResearchQueries before Chroma lookup
  * Apply fuseQueryResults RRF + section-weight fusion across all variants
  * Return structured evidence payload instead of raw Chroma response
- Update ui/pages/api/rag-chat.ts:
  * Use same-origin URL for fetch-documents (works in any deployment)
  * Scientific research assistant system prompt with strict citation rules
  * Use temperature from request instead of hard-coded 0
  * Section-prioritised citation rules in prompt (prefer Results > Methods > Abstract)
- Add ui/__tests__/scientific-rag.test.ts: 30 tests covering all public helpers
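
Two of the smaller helpers listed above, bounded integer parsing and budget-capped evidence assembly, can be sketched as follows. The signatures, the character-based budget, and the names `parseBoundedInt` and `capEvidence` are assumptions for illustration; the PR's parseBoundedInteger and buildEvidencePayload may differ in detail.

```typescript
// Hypothetical sketch: signatures and budget unit (characters) are assumptions.

// Parse an untrusted API parameter into an integer clamped to [min, max].
function parseBoundedInt(raw: unknown, fallback: number, min: number, max: number): number {
  const parsed = typeof raw === "number" ? raw : Number.parseInt(String(raw), 10);
  if (!Number.isFinite(parsed)) return fallback;
  return Math.min(max, Math.max(min, Math.trunc(parsed)));
}

interface Evidence {
  citationKey: string;
  text: string;
}

interface EvidencePayload {
  context: string;   // newline-joined evidence lines for the prompt
  sources: string[]; // citation keys actually included
}

// Take chunks in ranked order until the next one would exceed the budget.
function capEvidence(ranked: Evidence[], maxChars: number): EvidencePayload {
  const lines: string[] = [];
  const sources: string[] = [];
  let used = 0;
  for (const item of ranked) {
    const line = `[${item.citationKey}] ${item.text}`;
    if (used + line.length > maxChars) break;
    lines.push(line);
    sources.push(item.citationKey);
    used += line.length + 1; // +1 accounts for the joining newline
  }
  return { context: lines.join("\n"), sources };
}
```

Stopping at the first chunk that does not fit keeps the included evidence a strict prefix of the ranking, so a lower-ranked chunk never displaces a higher-ranked one.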

Validation:
- npx vitest run (30/30 passed)
- npx tsc --noEmit (0 errors)

Bounty: ISAAC-497
Algora bounty: https://algora.io/isaac/bounties/clq18zr98000ejs0gt0nv7gwu